iJoin: Importance-Aware Join Approximation over Data Streams

Authors

  • Dhananjay Kulkarni
  • Chinya V. Ravishankar
Abstract

We consider approximate join processing over data streams when memory limitations cause incoming tuples to overflow the available space, precluding exact processing. Selective eviction of tuples (load shedding) is needed, but is challenging because data distributions and arrival rates are not known in advance. Moreover, in many real-world applications, such as stock-market and sensor-data processing, different items may have different importance levels. Current methods pay little attention to load shedding when tuples carry such importance semantics, and perform poorly because they drop tuples prematurely and retain unproductive ones. We propose a novel framework, called iJoin, which overcomes these drawbacks and also gives tuples a fair chance of being part of the join result. Our load-shedding scheme for iJoin maximizes the total importance of join results and allows tuple importance to be reconfigured. We also show how to trade off load-shedding overhead against approximation error. Our experiments show that iJoin performs best and is practical.
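
The idea of shedding load by importance can be illustrated with a small Python sketch. This is not the iJoin algorithm, whose details the abstract does not give; it is a toy heuristic under stated assumptions: two streams named "R" and "S" share one buffer of `capacity` tuples, the importance of a join result is taken to be the sum of its two tuples' importances, and the eviction score (importance times one plus the number of matching keys seen so far on the opposite stream) is invented for illustration. The class and method names are hypothetical.

```python
from collections import Counter


class ImportanceAwareJoin:
    """Toy equi-join over two streams with a shared, bounded buffer.

    Assumption: this is an illustrative heuristic, not the iJoin scheme.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []  # retained tuples as (side, key, importance)
        self.seen = {"R": Counter(), "S": Counter()}  # key counts per stream
        self.results = []  # emitted join results as (key, importance)
        self.total_importance = 0.0

    def _score(self, entry):
        # Hypothetical retention score: important tuples whose key has
        # already appeared on the opposite stream are kept longer.
        side, key, importance = entry
        other = "S" if side == "R" else "R"
        return importance * (1 + self.seen[other][key])

    def insert(self, side, key, importance):
        other = "S" if side == "R" else "R"
        self.seen[side][key] += 1

        # Probe: join the arriving tuple with retained opposite-side tuples.
        for o_side, o_key, o_importance in self.buffer:
            if o_side == other and o_key == key:
                result_importance = importance + o_importance  # assumption
                self.results.append((key, result_importance))
                self.total_importance += result_importance

        # Retain the new tuple, then shed the lowest-scoring tuple if the
        # buffer overflows (the victim may be the tuple that just arrived).
        self.buffer.append((side, key, importance))
        if len(self.buffer) > self.capacity:
            victim = min(range(len(self.buffer)),
                         key=lambda i: self._score(self.buffer[i]))
            del self.buffer[victim]


if __name__ == "__main__":
    join = ImportanceAwareJoin(capacity=2)
    for side, key, importance in [("R", "a", 1.0), ("R", "b", 5.0),
                                  ("R", "c", 0.5), ("S", "b", 2.0),
                                  ("S", "c", 1.0)]:
        join.insert(side, key, importance)
    print(join.results, join.total_importance)
```

In this run the low-importance tuple with key "c" is shed before its partner arrives, so that match is lost, while the high-importance "b" pair is retained and joined. The linear scan for the eviction victim keeps the sketch short; a heap or a similar structure would be the usual way to lower the shedding overhead.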

Similar resources

GreedyDual-Join: Locality-Aware Buffer Management for Approximate Join Processing Over Data Streams

We investigate adaptive buffer management techniques for approximate evaluation of sliding window joins over multiple data streams. In many applications, data stream processing systems have limited memory or have to deal with very high speed data streams. In both cases, computing the exact results of joins between these streams may not be feasible, mainly because the buffers used to compute the...

The CLOCK Data-Aware Eviction Approach: Towards Processing Linked Data Streams with Limited Resources

Processing streams rather than static files of Linked Data has gained increasing importance in the web of data. When processing data streams system builders are faced with the conundrum of guaranteeing a constant maximum response time with limited resources and, possibly, no prior information on the data arrival frequency. One approach to address this issue is to delete data from a cache during...

Join Size Estimation Over Data Streams Using Cosine Series

In many applications, data takes the form of a continuous stream rather than a persistent data set. Data stream processing is generally an on-line, one-pass process and is required to be time and space efficient too. In this paper, we develop a framework for estimating join size over the data streams based on the discrete cosine transform (DCT). The DCT generally can provide concise and accurat...

Memory-Limited Execution of Windowed Stream Joins

We address the problem of computing approximate answers to continuous sliding-window joins over data streams when the available memory may be insufficient to keep the entire join state. One approximation scenario is to provide a maximum subset of the result, with the objective of losing as few result tuples as possible. An alternative scenario is to provide a random sample of the join result, e...

Relaxed Queries over Data Streams

Relaxation skyline queries have been proposed, in the relational context, as a solution to the so-called empty answer problem. Given a query composed of selection and join operations, a relaxation skyline query relies on the usage of a relaxation function (usually, a numeric function) to quantify the distance of each tuple (pair of tuples in case of join) from the specified conditions and uses ...

Publication date: 2008